R1: Score 6 (confidence 3)
We thank the reviewers for their time and feedback.
Q2: "What is the fundamental difference between converting the whole network vs. only the last layer?" This could hurt performance considerably at the beginning.
Q3: "What role does the ... regularization term play ... compared with FRCL?"
Q4: "Is it possible to do task detection?"
We thank the reviewers for their feedback and time! We are encouraged that they found our theoretical results "impressive". Large batch sizes help us obtain complexity guarantees that beat the state-of-the-art ones. We can add these details to the main body using the additional ninth page. We agree with this criticism. We will try to test our methods on this task and investigate the heavy-tailedness of stochastic gradients for this problem. Simsekli et al. focus on non-convex problems and on rates of convergence in expectation.